METHOD AND SYSTEM FOR THE IDENTIFICATION OF COMPOUNDS IN COMPLEX BIOLOGICAL OR ENVIRONMENTAL SAMPLES
Patent abstract:
Method and system for the identification of compounds in complex biological or environmental samples. The method comprises: receiving (102) a mass spectrum (1) from a mass spectrometry analysis coupled to a separation technique; for each point (2) of the mass spectrum (1), annotating (106) in an annotation database (12) the combinations of formulas and adducts whose theoretical mass-to-charge ratio (m/z)T corresponds to the measured mass-to-charge ratio (m/z) of the point (2); for each annotated formula and adduct, detecting (108) regions of interest in a retention time range (RT0-RT1) according to certain characterization criteria; generating (110) an inclusion list (14) with the retention time ranges (RT0-RT1) and the theoretical mass-to-charge ratios (m/z)T of the formulas and adducts associated with the regions of interest; and sending (112) the inclusion list to a mass spectrometer for the identification of compounds in the sample by tandem mass spectrometry. (Machine-translation by Google Translate, not legally binding)
Publication number: ES2767375A1
Application number: ES202030061
Filing date: 2020-01-24
Publication date: 2020-06-17
Inventors: Bertomeu Roger Giné; Torrado Oscar Yanes; Tomás Jordi Capellades
Applicants: Fundacio Inst Dinvestigacio Sanitaria Pere Virgili; Centro de Investigacion Biomedica en Red de Enfermedades Hepaticas y Digestivas CIBEREHD; Universitat Rovira i Virgili URV; Consorcio Centro de Investigacion Biomedica en Red MP
Patent description:
[0001] METHOD AND SYSTEM FOR THE IDENTIFICATION OF COMPOUNDS IN SAMPLES
[0003] Field of the Invention
[0004] The present invention falls within the field of metabolomics, which is characterized by the analysis of metabolites and small organic molecules in complex biological or environmental samples, such as plasma, urine, tissues and wastewater.
[0006] Background of the Invention
[0007] Modern mass spectrometers with very high resolution (> 40,000 FWHM) and mass accuracy (< 1-5 ppm), called HRMS (High Resolution Mass Spectrometry), perform mass scans (in MS1 or "full scan" mode) very quickly (in a few milliseconds) in order to analyze the ions originating from ionized compounds in a complex biological or environmental sample.
[0009] When a high-resolution mass spectrometer is coupled to a separation technique (known as "hyphenated MS"), for example liquid chromatography (LC-HRMS, Liquid Chromatography-High Resolution Mass Spectrometry) or capillary electrophoresis (CE-HRMS, Capillary Electrophoresis-High Resolution Mass Spectrometry), in untargeted metabolomics experiments on complex samples the raw data matrix may contain tens or hundreds of thousands of points (commonly known as "scans" or "data points").
[0011] However, the annotation and identification of metabolites by LC-HRMS or CE-HRMS in untargeted metabolomics studies has so far been complicated, and the number of metabolites identified is quite limited. The present invention proposes a new procedure that allows many more (and even all) of the ionized compounds in the biological sample to be identified, thereby increasing the coverage of the possible biomarkers detected.
[0013] Description of the Invention
[0014] The present invention relates to a method for analyzing untargeted metabolomics data obtained by mass spectrometry coupled to a separation technique, for example liquid chromatography (LC-MS) or capillary electrophoresis (CE-MS).
In mass spectrometry, different ionization methods can be used to produce ions, such as electrospray ionization (ESI) or atmospheric pressure chemical ionization (APCI). The analyzed sample can be a biological sample (plasma, tissues, etc.) or a complex environmental sample (e.g., wastewater).
[0016] According to a first aspect of the present invention, a method for the identification of compounds in complex biological or environmental samples is presented. The method comprises the following steps:
[0017] - Receiving a mass spectrum from a mass spectrometry analysis coupled to a separation technique applied to a sample, where the mass spectrum comprises a plurality of points with information on the retention time, the measured mass-to-charge ratio and the measured signal intensity.
- Querying a database of molecular formulas that includes the theoretical mass-to-charge ratio of the molecular ion for a plurality of molecular formulas and ionization adducts.
[0018] - For each point of the mass spectrum, recording in an annotation database the combinations of molecular formulas and ionization adducts whose theoretical mass-to-charge ratio corresponds to the measured mass-to-charge ratio of said point within a given mass error, where each annotation includes the retention time and the measured signal intensity of the point.
[0019] - For each molecular formula and ionization adduct annotated in the annotation database, detecting regions of interest, each defined over a retention time range, in which the annotated points meet certain characterization criteria.
- Generating an inclusion list that includes the retention time ranges of the detected regions of interest and the theoretical mass-to-charge ratios of the molecular formulas and ionization adducts associated with each region of interest.
[0020] - Sending the inclusion list to a mass spectrometer for the identification of compounds in the sample by tandem mass spectrometry.
[0022] The method may comprise a step of detecting, in the mass spectrum, isotopologues associated with the annotated molecular formulas and/or ionization adducts. The detection of isotopologues comprises:
[0023] - Searching, in the retention time range of each region of interest, for points of the mass spectrum whose measured mass-to-charge ratio corresponds, within a mass error, to a theoretical mass-to-charge ratio of an isotopologue of the molecular formula and/or ionization adduct associated with the region of interest.
[0024] - Obtaining the measured signal intensity of the points found.
[0025] - Calculating a theoretical intensity of the points found from the measured signal intensity of the points in the region of interest corresponding to the molecular formula and/or ionization adduct.
[0026] - Comparing the measured intensities with the calculated theoretical intensities.
- Determining the detection of the isotopologue based on said comparison.
[0028] In one embodiment, detecting the regions of interest comprises determining candidate regions, defined over a retention time range, with a minimum number of points and/or a minimum density of annotated points; characterizing the candidate regions (20) to obtain characterization parameters; and selecting as regions of interest those candidate regions whose characterization parameters meet certain characterization criteria.
[0030] The characterization criteria used can be very diverse:
[0031] - Calculating the slope of a linear regression of the annotated points in the candidate regions, and checking that the absolute value of the calculated slope is greater than a threshold slope.
[0032] - Calculating an average intensity and/or a maximum intensity of the measured signal of the annotated points in the candidate regions, and checking that the calculated average intensity and/or maximum intensity is higher than an average and/or maximum threshold intensity.
[0033] - Calculating a range of the measured signal intensity of the annotated points in the candidate regions, the intensity range being defined by a relationship between the maximum intensity and the minimum intensity in the candidate region, and checking that the calculated intensity range is above a threshold intensity range.
[0034] - Calculating a signal-to-noise ratio between an intensity level associated with the annotated points in the candidate region and an intensity level associated with the points of the mass spectrum located in an area surrounding the candidate region, and checking that the calculated signal-to-noise ratio is higher than a threshold signal-to-noise ratio. The area surrounding the candidate region can be defined by a space delimited by:
[0035] i. a mass-to-charge ratio range that includes the mass-to-charge ratio range corresponding to the candidate region, and
[0036] ii. a retention time range that includes the retention time range of the candidate region.
[0038] The method may comprise defining a set of molecular formulas based on the sample to be analyzed, defining ionization adducts associated with the molecular formulas, and generating the database of molecular formulas including, for each molecular formula and associated ionization adduct, the theoretical mass-to-charge ratio.
[0040] The method may comprise performing a mass spectrometry analysis coupled to a separation technique applied to the sample to obtain the mass spectrum.
[0042] The method may comprise performing a tandem mass spectrometry analysis using the information included in the inclusion list to identify compounds in the sample.
[0044] A second aspect of the present invention relates to a system for the identification of compounds in complex biological or environmental samples. The system comprises a control unit with data processing means configured to execute the steps of the previously defined method.
[0046] The system may comprise a mass spectrometer configured to perform a mass spectrometry analysis coupled to a separation technique on the sample to obtain the mass spectrum.
[0048] The system may comprise a mass spectrometer configured to perform a tandem mass spectrometry analysis using the information included in the inclusion list to identify compounds in the sample.
[0050] The present invention also relates to a program product for the identification of compounds in complex biological or environmental samples. The program product comprises program instructions for carrying out the previously defined method when the program is run on a processor. The program product may comprise at least one computer-readable storage medium that stores the program instructions.
[0052] Brief description of the drawings
[0053] A series of drawings that help to better understand the invention, and that expressly relate to an embodiment of said invention presented as a non-limiting example thereof, is briefly described below.
[0055] Figure 1 depicts a mass spectrum acquired by a mass spectrometer coupled to a separation technique.
[0057] Figures 2A and 2B illustrate, according to the state of the art, the detection of regions of interest and spectral peaks in the annotation process of the mass spectrum.
[0059] Figures 3A, 3B and 3C represent, according to the state of the art, the grouping of spectral peaks in the annotation process of the mass spectrum.
[0061] Figures 4A and 4B represent, respectively, the elution profiles of adenosine triphosphate and S-adenosyl methionine.
[0063] Figure 5 depicts a flow chart of one embodiment of the method of the present invention.
[0065] Figures 6A and 6B represent the number of overlaps of formulas and adducts annotated for the same point of the mass spectrum, considering a mass error of the mass spectrometer of 1 ppm and 5 ppm, respectively.
[0067] Figure 7 illustrates a flow chart of the region-of-interest detection process according to one embodiment.
[0069] Figure 8 shows an example of the determination and characterization of candidate regions.
[0071] Figure 9 represents the characterization of the candidate regions according to different criteria.
[0073] Figure 10 illustrates the area surrounding the candidate region used to determine one of the criteria for characterizing the candidate regions (signal-to-noise ratio).
[0075] Figure 11 illustrates a flow chart of the isotopologue detection process according to an embodiment of the present invention.
[0077] Figures 12A and 12B show two examples of the effect of mass spectrometer resolution on the possible detection of isotopologues.
[0079] Figure 13 represents an example showing the real and theoretical isotopologue pattern (M1, M2) of a specific formula (M0).
[0081] Detailed description of the invention
[0082] As represented in the graph of Figure 1, the points 2 of a mass spectrum 1 acquired in MS1 mode on a mass spectrometer coupled to a separation technique (e.g., liquid chromatography coupled to mass spectrometry, LC-MS, or capillary electrophoresis coupled to mass spectrometry, CE-MS) contain three axes of information: the mass-to-charge ratio of the detected ions (m/z), the intensity (proportional to the abundance of the detected ions), and the elution or retention time (RT). Each point 2 ("scan" or "data point") of the mass spectrum 1 normally contains information over a fairly wide m/z range (for example, from m/z 100 to 1,000) for a given instant of time, and may include up to thousands of m/z measurements, depending on the resolution of the instrument.
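To make this data model concrete, the following Python sketch assumes a hypothetical in-memory layout in which each MS1 scan is a retention time plus a list of (m/z, intensity) pairs, and expands it into the individual (RT, m/z, intensity) triplets on which annotation can operate. The names `DataPoint`, `flatten_scans` and the `min_intensity` cut-off are illustrative assumptions, not part of the patent, and real vendor or mzML readers expose scans differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical raw-scan layout: (retention time, [(m/z, intensity), ...])
Scan = Tuple[float, Sequence[Tuple[float, float]]]


@dataclass(frozen=True)
class DataPoint:
    """One (retention time, measured m/z, measured intensity) triplet."""
    rt: float
    mz: float
    intensity: float


def flatten_scans(scans: Iterable[Scan], min_intensity: float = 0.0) -> List[DataPoint]:
    """Expand MS1 scans into individual data points, skipping signals <= min_intensity."""
    points: List[DataPoint] = []
    for rt, peaks in scans:
        for mz, intensity in peaks:
            if intensity > min_intensity:
                points.append(DataPoint(rt, mz, intensity))
    return points
```

For example, `flatten_scans([(1.0, [(100.1, 50.0)]), (1.5, [(100.1, 60.0)])])` yields two `DataPoint` objects, one per scan.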
[0084] Currently, the annotation of the mass spectrum 1 in MS1 mode (MS1 annotation) follows this scheme:
[0086] 1) An algorithm (e.g., CentWave) is used to detect regions of interest 3 (ROI, Regions of Interest) in the raw data. It applies a continuous wavelet transform and Gaussian fitting in the chromatographic separation domain, or that of any other separation technique coupled to HRMS (retention time RT on the horizontal axis and measured signal intensity on the vertical axis, as represented in Figure 2A), to detect spectral peaks 4 across the entire mass spectrum 1 for different values of mass-to-charge ratio (m/z) and retention time (RT), as shown in Figure 2B.
[0088] 2) Subsequently, another algorithm (e.g., CAMERA, CliqueMS) groups the spectral peaks 4 that belong to the same compound because of the redundancy of adducts and isotopes (Figure 3A). The in-source fragments shown in Figure 3A are mainly due to the loss of water, that is: [M-H2O+H]+ in positive ionization, or [M-H2O-H]- in negative ionization. The spectral peaks 4 can be grouped by correlating peak shapes (Figure 3B), looking for a high correlation between the shapes of the peaks; spectral peaks 4 showing weak correlation are not grouped. Grouping can also be done by correlating peak abundance or intensity across different samples. The example in Figure 3C shows an almost constant ratio between the peak intensity of mass-to-charge ratio A and the peak intensity of mass-to-charge ratio C (coefficient of determination of the linear regression R2 = 0.98). Similarly, there is a strong correlation between mass-to-charge ratios B and D (R2 = 0.92). However, there is no correlation between mass-to-charge ratios A and B (R2 = 0.03).
However, this procedure has serious limitations when the elution profiles of the metabolites do not match the function (e.g., Gaussian) fitted to the data, which occurs, for example, with adenosine triphosphate (ATP, Figure 4A) or S-adenosyl methionine (SAM, Figure 4B).
[0090] Once the MS1 annotation is completed, an MS2 annotation is performed by tandem mass spectrometry, or MSn (n > 2), to characterize or identify the metabolites. There are currently three procedures for identifying metabolites by LC-MS/CE-MS (or LC-HRMS/CE-HRMS) and MSn in untargeted metabolomics:
[0092] - Inclusion list (targeted MS/MS): The samples are analyzed in MS1 mode and the data are processed by one or more software programs that detect and align peaks (as explained for Figure 3A). Normally, following criteria of statistical changes between groups or experimental conditions, a proportion of these m/z values are fragmented by MS2 or MSn analysis in a later experiment.
[0093] - Data-dependent acquisition (DDA): The mass spectrometer collects MS1 and MSn data in the same untargeted metabolomics analysis. A short MS1 survey cycle of the currently eluting m/z values serves to monitor their intensity and to identify/select potential m/z values to fragment. Then "n" MS2 or MSn cycles are applied, in each of which a single precursor m/z is isolated and fragmented and its fragments are detected. The precursors are fragmented in decreasing order of intensity. A dynamic exclusion window is typically used so that m/z values recently fragmented by MS2 are not fragmented again and again while new m/z values are available.
[0095] - Data-independent acquisition (DIA): The mass spectrometer collects MS1 and MSn data in the same untargeted metabolomics analysis. It is a method of determining molecular structures in which all the ions within a selected m/z range are fragmented and the mixture of fragments is detected.
MSn spectra are acquired either by fragmenting all the ions entering the mass spectrometer at any given time (broadband DIA) or by sequentially isolating and fragmenting all the ions in successive m/z ranges (SWATH™, Sequential Windowed Acquisition of All Theoretical Mass Spectra).
[0097] The present invention consists of a new procedure for processing raw data from an LC-HRMS or CE-HRMS analysis in MS1 mode and selecting mass-to-charge ratios (m/z) and retention time (RT) ranges for the identification of metabolites in a subsequent analysis performed by tandem mass spectrometry or MSn (n > 2).
[0099] The method 100 of the present invention comprises the steps shown in the flow diagram of Figure 5.
[0101] First, the method 100 comprises receiving 102 a mass spectrum 1 from an LC-MS or CE-MS analysis (acquired in MS1 mode) applied to a biological or environmental sample. The mass spectrum 1 comprises a plurality of points 2 with information including the retention time (RT), the measured mass-to-charge ratio (m/z) and the measured signal intensity.
[0102] Next, a database of molecular formulas 10 that includes the theoretical mass-to-charge ratio (m/z)T of the molecular ion of a plurality of molecular formulas and associated ionization adducts is accessed or queried 104. In one embodiment, the molecular formula database 10 comprises a list of formulas and a list of adducts, with the theoretical mass of each formula and each adduct, so that the theoretical mass-to-charge ratio of each combination of molecular formula and ionization adduct can then be calculated from the monoisotopic mass of the molecular formula plus the mass difference introduced by the ionization adduct when the molecule is charged in the source (e.g., H, Na, K). In another embodiment, the molecular formula database 10 directly stores the theoretical mass-to-charge ratio of the various formula and adduct combinations, so no further calculation is necessary.
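The first embodiment of paragraph [0102], in which (m/z)T is computed from the monoisotopic mass of the formula plus the mass change introduced by the adduct, can be sketched in Python as follows. The element table uses standard monoisotopic masses and the electron mass; the adduct list, the `ADDUCTS` encoding (atoms added, atoms removed, charge) and the function names are illustrative assumptions. The ion m/z is computed as (M + added - removed - z * m_e) / |z|, so a positive charge removes electrons and a negative one adds them.

```python
import re
from collections import Counter

ELECTRON_MASS = 0.00054857990946  # Da
MONOISOTOPIC = {  # mass of the most abundant isotope, Da
    "C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956,
    "P": 30.97376163, "S": 31.97207100, "Na": 22.9897692809, "K": 38.96370668,
}

# Illustrative adduct list: name -> (atoms added, atoms removed, charge)
ADDUCTS = {
    "[M+H]+": ("H", "", 1),
    "[M+Na]+": ("Na", "", 1),
    "[M+K]+": ("K", "", 1),
    "[M+NH4]+": ("NH4", "", 1),
    "[M+2H]2+": ("H2", "", 2),
    "[M-H2O+H]+": ("H", "H2O", 1),  # in-source water loss treated as an adduct
    "[M-H]-": ("", "H", -1),
    "[M-H2O-H]-": ("", "H3O", -1),
}


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C21H27NO4' into element counts."""
    counts: Counter = Counter()
    pos = 0
    for m in re.finditer(r"([A-Z][a-z]?)(\d*)", formula):
        if m.start() != pos:
            raise ValueError(f"cannot parse formula {formula!r}")
        counts[m.group(1)] += int(m.group(2) or 1)
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts


def monoisotopic_mass(formula: str) -> float:
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def theoretical_mz(formula: str, adduct: str) -> float:
    """(m/z)T = (M + added - removed - z * m_e) / |z|."""
    added, removed, z = ADDUCTS[adduct]
    ion_mass = monoisotopic_mass(formula) + monoisotopic_mass(added) - monoisotopic_mass(removed)
    return (ion_mass - z * ELECTRON_MASS) / abs(z)
```

With this sketch, glucose (C6H12O6) gives about 181.0707 for [M+H]+ and 179.0561 for [M-H]-, the familiar values for that compound.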
[0104] The content of the database of molecular formulas 10, or the information accessed in query 104, is preferably oriented to the particular sample to be analyzed, drawing on a large universe or space of molecular formulas related to the matrix to be analyzed (serum, urine, cells, environmental samples, etc.). For biological matrices of biomedical interest, the molecular formulas included in the Human Metabolome Database (HMDB) can be used. For example, a set of molecular formulas can be defined based on the sample to be analyzed, together with ionization adducts whose association with those molecular formulas is known. Databases can be used that include only molecular formulas oriented to the particular sample (for example, the formulas expected to be found in blood plasma), or larger databases, such as HMDB, which includes information on more than 10,000 metabolites found in the human body.
[0106] Once the molecular formulas and their ionization adducts are defined, the content of the database of molecular formulas 10 can be generated by including, for the molecular ion of each molecular formula and for each associated ionization adduct, the theoretical mass-to-charge ratio (m/z)T, which can be obtained directly from the molecular formula using the corresponding monoisotopic atomic masses. The method may comprise the step of generating the molecular formula database 10. Alternatively, the molecular formula database 10 may have been created before the method 100 is carried out, so that the method 100 only requires access to a memory (e.g., on a local device or in the cloud) in which the previously generated molecular formula database 10 is stored.
[0107] The construction of the database of molecular formulas 10 can comprise the generation of a table that contains all the theoretical mass-to-charge ratios (m/z)T obtained by considering, for each unique molecular formula, the main isotopologues (e.g., M1, M2, M3) and the known adducts in both positive and negative ionization (in-source fragments can be treated as adducts in the list of adducts). The information contained in the molecular formula database 10 can, for example, be structured as a table in which each row holds a different formula/adduct/isotopologue. The table can be ordered by the theoretical mass-to-charge ratio (m/z)T, in the first column, as represented in the following example:
[0112] The procedure searches for all the theoretical mass-to-charge values (m/z)T in each point 2 of the LC-MS or CE-MS mass spectrum 1, within a predefined error (usually between 1 and 5 ppm). Alternatively, the points 2 of the mass spectrum 1 are swept and it is checked, for each point 2, whether its measured mass-to-charge ratio (m/z) corresponds to any theoretical mass-to-charge ratio (m/z)T in the molecular formula database 10. To make the search easier, the molecular formula database 10 may hold its data ordered from lowest to highest theoretical mass-to-charge ratio (m/z)T.
[0114] For each point 2 of the mass spectrum 1, the molecular formulas and ionization adducts whose theoretical mass-to-charge ratio (m/z)T corresponds to the measured mass-to-charge ratio (m/z) of said point, within a certain margin or mass error (resulting from the measurement accuracy or calibration of the mass spectrometer), are annotated 106 in an annotation database 12. The annotation database 12 includes, for each annotated molecular formula and ionization adduct, the retention time (RT) and the measured signal intensity of the point associated with the formula/adduct.
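Keeping the formula database sorted by (m/z)T, as suggested in paragraph [0112], lets each point be annotated with two binary searches instead of a full scan of the table. The Python sketch below is one possible way to do this under stated assumptions: the tuple layouts, the function names, and computing the ppm window on the measured m/z (a close approximation to the theoretical-based definition at the 1-5 ppm level) are all illustrative choices, not the patent's implementation. Annotations are grouped by (formula, adduct) so that each group can later be scanned for regions of interest.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Entry = Tuple[float, str, str]           # (theoretical m/z, formula, adduct/isotopologue label)
Point = Tuple[float, float, float]       # (rt, measured m/z, measured intensity)
Annotation = Tuple[float, float, float]  # (rt, measured intensity, measured m/z)


def build_table(entries: Iterable[Entry]) -> List[Entry]:
    """Formula database ordered from lowest to highest theoretical m/z."""
    return sorted(entries)


def matches(table: Sequence[Entry], keys: Sequence[float], measured_mz: float, ppm: float) -> Sequence[Entry]:
    """All entries whose (m/z)T lies within +/- ppm of the measured m/z (binary search)."""
    delta = measured_mz * ppm * 1e-6  # window computed on the measured value (approximation)
    lo = bisect.bisect_left(keys, measured_mz - delta)
    hi = bisect.bisect_right(keys, measured_mz + delta)
    return table[lo:hi]


def annotate(points: Iterable[Point], table: List[Entry], ppm: float = 5.0) -> Dict[Tuple[str, str], List[Annotation]]:
    """Annotation database grouped by (formula, adduct)."""
    keys = [e[0] for e in table]
    db: Dict[Tuple[str, str], List[Annotation]] = defaultdict(list)
    for rt, mz, intensity in points:
        for _, formula, adduct in matches(table, keys, mz, ppm):
            db[(formula, adduct)].append((rt, intensity, mz))
    return db
```

A point that falls within tolerance of several entries is recorded under each of them, which is exactly what produces the formula/adduct overlaps discussed for Figures 6A and 6B.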
The information contained in the annotation database 12 can be structured, for example, as a table in which each row holds a different annotation. Each row is therefore a new annotation that includes the annotated formula and/or adduct, its corresponding retention time (RT), the measured signal intensity of the associated point 2 of the mass spectrum 1 and, optionally, the measured mass-to-charge ratio (m/z).
[0117] The retention time (RT) and intensity annotations made in different rows of the annotation database 12 for the same formula and adduct can be grouped (and even plotted, as in Figure 8 for the NH4 adduct of the formula C21H27NO4) for the subsequent detection of regions of interest.
[0119] Depending on the defined mass error, there will be more or less overlap between the possible formulas and/or adducts annotated for the same point 2. In the graph of Figure 6A, which assumes a mass error of 1 ppm, the horizontal axis represents the number of overlaps produced for the measured mass-to-charge ratios (m/z) of the points of the mass spectrum 1 (from 0 to 7 overlaps), and the vertical axis the number of occurrences of each overlap count. For example, the mass-to-charge ratio (m/z) measured at a point 2 of mass spectrum 1 may correspond to two different formulas/adducts within the 1 ppm mass error: the negative ionization adduct [M-H-NH3]- of the molecular formula C6H9NO2, and the negative ion [M-H]- of the molecular formula C6H6O2. There is therefore an overlap between two formulas and/or adducts. When the overlap occurs between N formulas and/or adducts, N-1 overlaps are considered to exist. Of the nearly 100,000 points 2 of the mass spectrum 1 in the example of Figure 6A, more than 40,000 points show no overlap, more than 20,000 points show 1 overlap and more than 10,000 points show two overlaps.
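The N-1 overlap convention of paragraph [0119] can be expressed as a short Python sketch that counts, for each measured m/z, how many theoretical values fall inside the ppm tolerance and builds a histogram like the one in Figure 6A. Whether unannotated points are included in that figure is not stated; here they are skipped, which is an assumption, and the function names are hypothetical. Note that the example pair in the text cannot be separated by any mass accuracy: [M-H-NH3]- of C6H9NO2 and [M-H]- of C6H6O2 both have the elemental composition C6H5O2-, so their theoretical m/z values are identical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def overlaps_at_point(measured: float, candidates: Sequence[float], ppm: float) -> Tuple[int, int]:
    """Return (overlaps, n_matches); N matching formula/adducts count as N-1 overlaps."""
    n = sum(1 for t in candidates if abs(ppm_error(measured, t)) <= ppm)
    return max(n - 1, 0), n


def overlap_histogram(measured_mzs: Iterable[float], candidates: Sequence[float], ppm: float) -> Counter:
    """Number of annotated points per overlap count (unannotated points are skipped)."""
    hist: Counter = Counter()
    for mz in measured_mzs:
        overlaps, n = overlaps_at_point(mz, candidates, ppm)
        if n > 0:
            hist[overlaps] += 1
    return hist
```

Rerunning `overlap_histogram` with `ppm=5.0` instead of `ppm=1.0` on the same data shifts counts toward higher overlap values, mirroring the difference between Figures 6A and 6B.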
As the mass error increases, the overlap between different possible formula-adduct combinations increases (Figure 6B shows the overlap with a mass error of 5 ppm).
[0121] Next, once the annotation 106 has been made, each molecular formula and ionization adduct in the annotation database 12 is analyzed, grouping all the annotations made for the same formula/adduct (see the example in Figure 8), in order to detect 108 regions of interest, each defined over a retention time range (RT0-RT1), in which the annotated points meet certain characterization criteria. The same formula/adduct in the annotation database 12 can have a single region of interest or several regions of interest detected in different retention time ranges.
[0122] The method 100 implements an algorithm that finds regions of interest by verifying one or more characterization criteria: it first applies a criterion of minimum density and/or minimum number of points in the region (which determines the candidate regions), and then applies additional criteria, such as a minimum slope of the points in the region or a certain minimum signal-to-noise ratio. Optionally, though it is recommended, the detected regions of interest can be compared with a blank sample to rule out false positives or points exogenous to the sample.
[0124] Therefore, and unlike the state of the art, determining the regions of interest does not consist of finding peaks in the mass spectrum 1 by fitting a model (e.g., Gaussian) to the data. The new approach is independent of the shape and determination of the spectral peaks 4, and it is not necessary to perform any kind of correlation between spectral peaks (as shown in Figures 3B and 3C for the state of the art). This makes the process of the present invention independent of the chromatographic conditions.
[0126] Figure 7 shows a flow chart of the region-of-interest detection process 108 according to one embodiment.
Detection 108 of regions of interest comprises determining 122, for each molecular formula and ionization adduct in the annotation database 12, candidate regions 20 defined over a retention time range (RTc0-RTc1), as represented in the example of Figure 8, with a minimum number of points and/or a minimum density of annotated points in the candidate region 20. This step corresponds to filtering by point density: only those time windows that group a minimum number of points and/or a minimum density of points are considered as possible regions of interest (i.e., candidate regions 20). The example in Figure 8 represents the points 2 of the mass spectrum 1 annotated in the annotation database 12 for the [M+NH4]+ adduct of the molecular formula C21H27NO4. The candidate regions 20 that have passed the density filter are also shown; for example, the time ranges that collect at least five points 2 of the mass spectrum 1 within a given maximum time span are selected as candidate regions 20.
[0127] Next, the candidate regions 20 are characterized 124, obtaining characterization parameters 22 of the candidate regions 20. Finally, the characterization parameters 22 obtained are compared 126 with the characterization criteria, and those candidate regions 20 whose characterization parameters 22 meet the characterization criteria are selected 128 as regions of interest.
[0129] Figure 9 shows different ways of characterizing 124 the candidate regions 20 and different characterization criteria that the candidate regions 20 may be required to meet. For example, any combination of the following characterization criteria, among others, can be considered:
[0131] - A minimum slope of the points 2 in the candidate region 20: The characterization of the candidate regions 20 can comprise calculating 132 the slope (m) of a linear regression 24 (see Figure 8) of the points 2 annotated in the candidate regions 20. The characterization criteria may include checking 142 that the absolute value of the calculated slope is greater than a minimum slope (mmin), or threshold slope.
[0133] - An average and/or maximum intensity of the signal measured in the candidate region: The characterization of the candidate regions may comprise calculating 134 an average intensity (Imed) and/or calculating 136 a maximum intensity (Imax) of the measured signal of the points 2 annotated in the candidate regions 20. The characterization criteria may include checking 144 that the calculated average intensity (Imed) is greater than an average threshold intensity (ImedTH) and/or checking 146 that the calculated maximum intensity (Imax) is greater than a maximum threshold intensity (ImaxTH).
[0135] - An intensity range in the candidate region: The characterization of the candidate regions may comprise calculating 138 a range of the measured signal intensity of the annotated points in the candidate regions, where the intensity range is defined by a relationship between the maximum intensity and the minimum intensity in the candidate region (e.g., a logarithmic relationship between the maximum intensity value, Imax, and the minimum intensity value, Imin, of the points 2 annotated in the candidate region 20). The characterization criteria may include checking 148 that the calculated intensity range is greater than a threshold intensity range.
[0136] - A minimum signal-to-noise ratio (SNR): The characterization of the candidate regions may comprise calculating 140 a signal-to-noise ratio (SNR) between an intensity level associated with the points 2 annotated in the candidate region 20 and an intensity level associated with the points 2 of the mass spectrum 1 located in an area surrounding the candidate region 20. The characterization criteria may include checking 150 that the calculated signal-to-noise ratio (SNR) is greater than a threshold signal-to-noise ratio (SNRTH). According to the embodiment shown in Figure 10, the area 26 surrounding the candidate region 20 can be defined by a space delimited by a mass-to-charge ratio range (m/zP0-m/zP1) that includes the mass-to-charge ratio range (m/zC0-m/zC1) corresponding to the candidate region 20, and by a retention time range (RTp0-RTp1) that includes the retention time range (RTc0-RTc1) corresponding to the candidate region 20, where said space may or may not include the candidate region 20 itself. (In Figure 10 the range m/zC0-m/zC1 of the candidate region is not to scale and has been enlarged for illustrative purposes; in practice the m/zP0-m/zP1 range is much larger than the m/zC0-m/zC1 range, up to about 100,000 times larger.) The candidate region can be considered to span a mass-to-charge ratio range (m/zC0-m/zC1) because a mass error is allowed when annotating in the annotation database 12.
[0138] - A minimum and/or maximum width of the retention time range (RT0-RT1) of the regions of interest (i.e., a minimum and/or maximum time distance from the beginning to the end of the region).
[0140] However, other parameters or different characterization criteria can be used. Furthermore, the characterization criteria can be coupled to machine learning techniques (artificial neural networks, random forests, etc.)
to filter candidate regions 20 and generate a more specific inclusion list, at the cost of introducing a bias associated with the learning method itself.
[0142] In the example in Figure 8, the candidate region on the left is not selected as a region of interest because it does not meet the minimum slope criterion (|m| < mmin). The central candidate region is not selected as a region of interest either, because the average intensity (Imed) of its points 2 is lower than the average threshold intensity (ImedTH). The candidate region 20 on the right is selected 128 as a region of interest 28 because its characterization parameters 22 meet the required characterization criteria (e.g., |m| > mmin, Imed > ImedTH, etc.). In the case represented, the region of interest 28 coincides with the candidate region (RTc0 = RT0, RTc1 = RT1). However, the region of interest 28 finally considered may result from the grouping of other overlapping regions (e.g., grouping of candidate regions or of other overlapping regions of interest).
[0144] The method 100 continues with the generation 110 of an annotated and very precise inclusion list 14 for MS/MS (or MSn) experiments, with variable time ranges according to the elution profile of each m/z, which facilitates the identification of metabolites. The inclusion list 14 includes the retention time ranges (RT0-RT1) of the detected regions of interest and the theoretical mass-to-charge ratios (m/z)T of the molecular formulas and/or ionization adducts associated with each of the detected regions of interest. Optionally, the inclusion list may also include the molecular formulas and/or ionization adducts associated with each of the detected regions of interest.
[0146] Finally, the inclusion list 14 is sent 112 to a mass spectrometer for tandem mass spectrometry analysis to identify metabolites in the sample using the data from the inclusion list 14.
Optionally, the method can comprise performing the tandem mass spectrometry analysis, using the information in the inclusion list to identify metabolites in the sample. The MS/MS analyses are performed after the MS1-mode mass scans of the LC-MS analysis. They require a second injection of the same sample, since there is currently no technology to accumulate or store ions after they have been detected in MS1. [0148] The new procedure analyzes the points of the mass spectrum of a representative biological sample, acquired in MS1 mode, to select the mass-to-charge ratios (m/z), and their time ranges, that will be fragmented in subsequent MSn experiments. A novel aspect of the present invention is the way the m/z values and retention time ranges for the MSn analysis are selected. Since the selection is not based on peak detection, the method is independent of the chromatographic elution profile of the compound, and it can detect metabolites with non-Gaussian or similar elution profiles (as in Figures 4A and 4B). Furthermore, when the molecular formulas and/or ionization adducts associated with the detected regions of interest are sent to the mass spectrometer, the spectrometer can use this information in post-fragmentation analyses to identify compounds more quickly from a specific list of candidate formulas. [0150] Furthermore, the present invention provides a novel way of detecting isotopologues of molecular formulas and/or ionization adducts in the mass spectrum 1. Isotopologues can be detected once the regions of interest 28 of the molecular formulas and ionization adducts have been detected 108.
The detection 120 of isotopologues comprises, as shown in the flowchart of Figure 11, searching 162 within the retention time range (RT0-RT1) of each region of interest 28, or at least within a time interval included in that range. The search looks for points 2 of the mass spectrum 1 whose measured mass-to-charge ratio (m/z) matches, within a mass error, the theoretical mass-to-charge ratio (m/z)T of an isotopologue (e.g., M1) of the molecular formula and ionization adduct (M0) associated with the region of interest. This verifies that the m/z of the point matches the theoretical m/z of the isotopologue, allowing for the mass error of the spectrometer. [0152] Next, the measured signal intensity of each of the points found in the search 162 is obtained 164. A theoretical intensity of these points is then calculated 166 from two inputs: the intensity of the points in the region of interest (i.e., the points corresponding to the main formula/adduct M0), and the theoretical abundance ratio of the isotopologue in question (M1, M2, etc.) relative to M0. For example, if the theoretical abundance ratio of an isotopologue M1 is 2.5% relative to the main formula/adduct M0, the theoretical intensity of the isotopologue is 2.5% of the intensity of the points of the region of interest. The measured intensities are then compared 168 with the calculated theoretical intensities, and whether or not the isotopologue is detected is determined 170 based on this comparison.
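The search 162 and the theoretical-intensity calculation 166 can be sketched as follows. This is a hedged illustration under stated assumptions, not the patented implementation:

- The mass error is assumed to be a ppm tolerance.
- Points from the same scan are assumed to share an identical RT value, so M0 and isotopologue points can be paired by RT.
- M1 is assumed to be the 13C isotopologue, with a shift of 1.0033548 Da divided by the charge.
- `Point`, `isotopologue_mz` and `find_isotopologue` are hypothetical names.

```python
from dataclasses import dataclass

C13_SHIFT = 1.0033548  # 13C - 12C mass difference (Da)

@dataclass(frozen=True)
class Point:
    rt: float         # retention time of the scan
    mz: float         # measured m/z
    intensity: float  # measured signal intensity

def isotopologue_mz(mz_m0, n_c13=1, charge=1):
    """Theoretical (m/z)T of the isotopologue carrying n_c13 extra 13C atoms."""
    return mz_m0 + n_c13 * C13_SHIFT / abs(charge)

def within_ppm(measured, theoretical, ppm):
    return abs(measured - theoretical) / theoretical * 1e6 <= ppm

def find_isotopologue(points, roi_points, mz_iso, ratio, ppm=5.0):
    """Search 162 plus calculation 166.

    points: all points of the mass spectrum.
    roi_points: points of the region of interest annotated to M0.
    Returns (point, theoretical_intensity) pairs, where the theoretical
    intensity is Int(M0 in the same scan) * ratio.
    """
    m0_by_rt = {p.rt: p.intensity for p in roi_points}
    rt0, rt1 = min(m0_by_rt), max(m0_by_rt)
    hits = []
    for p in points:
        if not rt0 <= p.rt <= rt1 or p.rt not in m0_by_rt:
            continue  # outside RT0-RT1 or no M0 point in the same scan
        if within_ppm(p.mz, mz_iso, ppm):
            hits.append((p, m0_by_rt[p.rt] * ratio))
    return hits

# Hypothetical data using the 2.5 % abundance ratio of the example above.
m0 = [Point(60.0, 166.08626, 1.0e6), Point(61.0, 166.08625, 8.0e5)]
spectrum = m0 + [Point(60.0, 167.08962, 2.6e4), Point(61.0, 167.08960, 1.9e4),
                 Point(60.0, 167.50000, 5.0e3)]
mz_m1 = isotopologue_mz(166.086255)
for point, expected in find_isotopologue(spectrum, m0, mz_m1, ratio=0.025):
    print(point.rt, point.intensity, round(expected))
# 60.0 26000.0 25000
# 61.0 19000.0 20000
```

The point at m/z 167.5 is rejected by the ppm test. Each surviving point carries the theoretical intensity against which step 168 compares it.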
In one embodiment, it is checked, for each point found, whether its measured intensity matches the theoretical intensity of the isotopologue within a certain intensity margin. The margin allows, for example, for sensitivity errors in the measurement or for deviations from the theoretical abundance ratio of the isotopologue relative to the formula/adduct M0. To calculate the theoretical intensity of the isotopologue, two values are used: the intensity of the corresponding M0, Int(M0), i.e., the measured signal intensity of the point of the region of interest 28 at the corresponding time RT (in the same scan); and the theoretical abundance ratio (ratio) of the isotopologue relative to M0. The comparison 168 of measured and theoretical intensities allows an intensity range. For example, it is verified that the measured intensity of the isotopologue, Int(iso), lies within an interval built around the theoretical value Int(M0)·ratio using a value k: [0153] $\mathrm{Int}(M0)\cdot\mathrm{ratio}\cdot(1+k) > \mathrm{Int}(\mathrm{iso}) > \mathrm{Int}(M0)\cdot\mathrm{ratio}\cdot(1-k)$ [0155] An additional check based on cosine similarity can then optionally be performed. [0159] This check can be done as follows: [0160] • Search the annotation database 12 for the entries that correspond to the two conditions to be compared (e.g., M0 versus an isotopologue M1) in the RT interval of the region of interest being analyzed for M0. [0161] • Select the entries of each set that share the same retention time RT, i.e., entries of both conditions M0 and M1 found in the same scan (same RT instant). [0162] • If there are enough entries (e.g., more than 5, to avoid false positives when N is small), calculate the cosine similarity, where $I = \langle i_1, i_2, \ldots, i_N\rangle$ and $J = \langle j_1, j_2, \ldots, j_N\rangle$ are the intensity vectors of the two conditions being compared: [0164] $\cos = \frac{i_1 j_1 + i_2 j_2 + \cdots + i_N j_N}{\lVert I\rVert \cdot \lVert J\rVert}$ [0166] • If cos exceeds a threshold (e.g., 0.99), an isotopologue is considered found and it is recorded.
[0168] The search 162 for points corresponding to an isotopologue of a given formula and adduct can be performed by querying the annotation database 12. This database may contain annotations of the isotopologues (M1, M2, ...) in addition to those of the formulas/adducts (M0). To do this, whenever a formula/adduct (M0) is annotated 106, it is checked whether there is a point whose mass-to-charge ratio corresponds to an isotopologue (M1, M2, ...) and whose intensity is close to the theoretical one; if so, the isotopologue is annotated as well. Alternatively, the search 162 for isotopologues can be performed directly in the mass spectrum 1, since the time instant RT and the mass-to-charge ratio to search for are known. [0170] The isotopologues whose presence or absence must be determined for each annotated formula and/or adduct may be defined in the molecular formula database 10. This database may include, for example, the isotopologues to consider for each formula and/or adduct (e.g., the main isotopologues M1 and M2 of each formula/adduct M0) and their theoretical mass-to-charge ratios (m/z)T. The molecular formula database 10 can also include the theoretical abundance ratio of each isotopologue. In one embodiment, the isotopologues that can theoretically be detected are determined from the mass resolution of the spectrum in the m/z range analyzed. This makes it possible to adjust, for each M0, the set of isotopologues that the mass spectrometer can detect given the resolution of the instrument.
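The intensity-interval test of [0153] and the cosine-similarity check of [0160]-[0166] can be sketched together as follows. This is a minimal sketch, not the patented implementation:

- The interval margin k = 0.2 is a hypothetical value; the source gives none.
- Entries are assumed to be stored as dictionaries mapping RT to intensity, so pairing "same scan" entries is a key intersection.
- The function names are hypothetical.

```python
import math

def within_intensity_interval(int_m0, int_iso, ratio, k=0.2):
    """[0153]: Int(M0)*ratio*(1+k) > Int(iso) > Int(M0)*ratio*(1-k)."""
    expected = int_m0 * ratio
    return expected * (1 - k) < int_iso < expected * (1 + k)

def cosine_similarity(i, j):
    """[0164]: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(i, j))
    norm = math.sqrt(sum(a * a for a in i)) * math.sqrt(sum(b * b for b in j))
    return dot / norm if norm > 0 else 0.0

def confirm_by_cosine(m0_by_rt, iso_by_rt, min_entries=5, threshold=0.99):
    """[0160]-[0166]: pair M0 and isotopologue entries from the same scan (same RT),
    require more than min_entries pairs, then compare the intensity profiles."""
    shared = sorted(set(m0_by_rt) & set(iso_by_rt))
    if len(shared) <= min_entries:
        return False  # too few points: risk of a false positive
    i = [m0_by_rt[rt] for rt in shared]
    j = [iso_by_rt[rt] for rt in shared]
    return cosine_similarity(i, j) > threshold

# Hypothetical elution profile of M0 over six scans, and an M1 trace at 2.5 %.
profile = {60.0 + s: v for s, v in enumerate([1e5, 4e5, 9e5, 1e6, 6e5, 2e5])}
m1 = {rt: 0.025 * v for rt, v in profile.items()}
print(within_intensity_interval(1e6, 2.6e4, 0.025))  # True
print(confirm_by_cosine(profile, m1))                # True
```

Cosine similarity ignores the overall scale of the two vectors. The check therefore confirms that the isotopologue co-elutes with the same profile shape as M0, while the interval test handles the absolute intensity.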
Information about the isotopologues can be stored, for example, in an isotopologue database. For each isotopologue (M1, M2, ...) detectable with the mass spectrometer, this database stores its composition, its mass-to-charge ratio (m/z) relative to M0, and its abundance ratio. [0172] The method therefore makes it possible to calculate the isotopic pattern of each formula and to determine which isotopologues the instrument can detect, given their intensity ratio relative to M0 and the resolution of the mass spectrometer. How to determine whether the calculated isotopologue peaks can be resolved depends on the mass analyzer used (as explained, for example, in "Orbitrap Mass Spectrometry", Zubarev et al., Analytical Chemistry 2013, 85(11), pp. 5288-5296). In Orbitrap analyzers, the resolution is inversely proportional to the square root of m/z and can therefore be calculated mathematically. In FTICR analyzers, the resolution is inversely proportional to m/z, so it can also be calculated mathematically. In TOF analyzers, by contrast, the resolution is approximately independent of m/z, so the resolution at each m/z is determined from a calibration curve. [0174] Figures 12A and 12B show an example of the isotopic pattern (M0, M1 and M2) of phenylalanine (C9H11NO2). They illustrate the effect of resolution on distinguishing isotopologues, and how a higher resolution makes it possible to distinguish the different isotopologues that make up M1 and M2. Figure 12A corresponds to a resolution of 200,000 (Orbitrap type) and Figure 12B to a resolution of 60,000 (qTOF type). These figures show: [0175] - Blue vertical lines: the theoretical mass-to-charge ratios (m/z) and their abundances, according to the relative abundance of each natural isotope of each atom.
[0176] - Green curved lines: instruments, depending on their resolution, cannot perfectly distinguish the theoretical mass-to-charge ratios. What is actually observed in the mass spectrum is therefore a curve that envelops them (green curved line). Depending on the resolution, this curve resolves the blue vertical lines more or less well. [0177] - Red vertical lines: a simplification (a sum) of the green curved line. Instead of recording every point of the curve, it is represented as a single signal (called a centroid), whose value is an abundance-weighted average of the mass-to-charge ratios enclosed by the green curve. [0179] In the example of Figure 12A, the resolution of 200,000 (Orbitrap type) is sufficient to completely separate the M1 and M2 isotopologues. However, in the case shown in Figure 12B (resolution of 60,000, qTOF type), the M1 isotopologues can be separated but the M2 isotopologues cannot be distinguished. [0181] Figure 13 shows a real example in which the pattern of isotopologues (M1, M2) of a particular formula M0 can be seen. The pattern approximately follows the calculated theoretical intensity ratio (dashed line). [0183] When several formula-adduct combinations overlap at a given mass-to-charge ratio (m/z), the number of isotopologues of the same formula detected by the method can be used to prioritize one candidate formula over another. This provides relevant information about which compound may be involved even before tandem mass spectrometry is performed.
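The resolution scaling described in [0172] and the separability question of [0179] can be illustrated with the phenylalanine [M+H]+ ion. This is a hedged sketch, not the method's actual separability procedure. It rests on four assumptions:

- Two peaks count as resolved when their spacing is at least one FWHM (m/z divided by R).
- The nominal resolution is quoted at m/z 200, a common Orbitrap convention.
- TOF resolution is treated as constant instead of being read from a calibration curve.
- Only the 13C/15N pair for M1 and the 13C2/18O pair for M2 are considered.

All function names are hypothetical. With these assumptions the outcome agrees qualitatively with [0179].

```python
import math

# Isotope mass differences (Da) that produce the fine structure of M1 and M2.
C13_SHIFT = 1.0033548  # 13C - 12C
N15_SHIFT = 0.9970349  # 15N - 14N
O18_SHIFT = 2.0042450  # 18O - 16O

def resolving_power(mz, analyzer, r_ref, mz_ref=200.0):
    """Resolution at mz, scaled from a reference resolution r_ref quoted at mz_ref."""
    if analyzer == "orbitrap":
        return r_ref * math.sqrt(mz_ref / mz)  # R proportional to 1/sqrt(m/z)
    if analyzer == "fticr":
        return r_ref * mz_ref / mz             # R proportional to 1/(m/z)
    if analyzer == "tof":
        return r_ref  # simplification: the text uses a calibration curve
    raise ValueError(f"unknown analyzer: {analyzer}")

def separable(mz_a, mz_b, resolution):
    """Assumed criterion: peaks are resolved if spaced by at least one FWHM."""
    fwhm = 0.5 * (mz_a + mz_b) / resolution
    return abs(mz_a - mz_b) >= fwhm

# Phenylalanine C9H11NO2, [M+H]+ monoisotopic m/z of about 166.08626.
mz0 = 166.08626
m1_pair = (mz0 + C13_SHIFT, mz0 + N15_SHIFT)      # 13C versus 15N
m2_pair = (mz0 + 2 * C13_SHIFT, mz0 + O18_SHIFT)  # 13C2 versus 18O

for name, analyzer, r_ref in [("Orbitrap", "orbitrap", 200_000), ("qTOF", "tof", 60_000)]:
    results = [separable(a, b, resolving_power(a, analyzer, r_ref))
               for a, b in (m1_pair, m2_pair)]
    print(name, *results)
# Orbitrap True True
# qTOF True False
```

The M2 pair is only about 0.0025 apart in m/z. At R = 60,000 one FWHM near m/z 168 is about 0.0028, so the two M2 isotopologues merge, while the M1 pair (about 0.0063 apart) stays resolved.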
Claims (16) [1] 1. A method for the identification of compounds in complex biological or environmental samples, characterized in that it comprises: receiving (102) a mass spectrum (1) from a mass spectrometry analysis coupled to a separation technique applied to a sample, where the mass spectrum (1) comprises a plurality of points (2) with retention time (RT) information, measured mass-to-charge ratio (m/z) and measured signal intensity; querying (104) a molecular formula database (10) that includes the theoretical mass-to-charge ratio (m/z)T of the molecular ion of a plurality of molecular formulas and ionization adducts; for each point (2) of the mass spectrum (1), annotating (106) in an annotation database (12) the combinations of molecular formulas and ionization adducts whose theoretical mass-to-charge ratio (m/z)T corresponds to the measured mass-to-charge ratio (m/z) of said point (2) within a certain mass error, where each entry includes the retention time (RT) and the measured signal intensity of the point (2); for each molecular formula and ionization adduct annotated in the annotation database (12), detecting (108) regions of interest defined in a retention time range (RT0-RT1) where the annotated points meet characterization criteria; generating (110) an inclusion list (14) that includes the retention time ranges (RT0-RT1) of the detected regions of interest and the theoretical mass-to-charge ratios (m/z)T of the molecular formulas and ionization adducts associated with each of the regions of interest; and sending (112) the inclusion list to a mass spectrometer for the identification of compounds in the sample by tandem mass spectrometry. [2] 2.
The method of claim 1, comprising detecting in the mass spectrum (1) isotopologues associated with the annotated molecular formulas and/or ionization adducts, where the detection of isotopologues comprises: searching (162), in the retention time range (RT0-RT1) of each region of interest (28), for points (2) of the mass spectrum (1) whose measured mass-to-charge ratio (m/z) corresponds, within a mass error, to a theoretical mass-to-charge ratio (m/z)T of an isotopologue of the molecular formula and/or ionization adduct associated with the region of interest (28); obtaining (164) the measured signal intensity of the points found; calculating (166) a theoretical intensity of the points found from the measured signal intensity of the points in the region of interest (28) corresponding to the molecular formula and/or ionization adduct; comparing (168) the measured intensities with the calculated theoretical intensities; and determining (170) the detection of the isotopologue based on said comparison. [3] 3. The method of any of the preceding claims, wherein the detection (108) of the regions of interest comprises: determining (122) candidate regions (20) defined in a retention time range (RTC0-RTC1) with a minimum number of points and/or a minimum density of annotated points; characterizing (124) the candidate regions (20), obtaining characterization parameters (22); and selecting (128) as regions of interest those candidate regions (20) whose characterization parameters (22) meet certain characterization criteria. [4] 4. The method of claim 3, wherein the characterization (124) of the candidate regions (20) comprises calculating (132) a slope (m) of a linear regression (24) of the points (2) annotated in the candidate regions (20); and wherein the characterization criteria comprise checking (142) that the absolute value of the calculated slope (m) is greater than a threshold slope (mmin).
[5] 5. The method of any one of claims 3 to 4, wherein the characterization (124) of the candidate regions (20) comprises calculating (134, 136) a mean intensity (Imed) and/or a maximum intensity (Imax) of the measured signal of the points (2) annotated in the candidate regions (20); and wherein the characterization criteria comprise verifying (144, 146) that the calculated mean intensity (Imed) and/or maximum intensity (Imax) is greater than a threshold mean intensity (ImedTH) and/or a threshold maximum intensity (ImaxTH). [6] 6. The method of any one of claims 3 to 5, wherein the characterization (124) of the candidate regions (20) comprises calculating (138) a measured signal intensity range of the points (2) annotated in the candidate regions (20), the intensity range being defined by a relationship between the maximum intensity and the minimum intensity in the candidate region (20); and wherein the characterization criteria comprise verifying (148) that the calculated intensity range is greater than a threshold intensity range. [7] 7. The method of any of claims 3 to 6, wherein the characterization (124) of the candidate regions (20) comprises calculating (140) a signal-to-noise ratio (SNR) between an intensity level associated with the points (2) annotated in the candidate region (20) and an intensity level associated with the points (2) of the mass spectrum (1) located in an area (26) surrounding the candidate region (20); and wherein the characterization criteria comprise checking (150) that the calculated signal-to-noise ratio (SNR) is greater than a threshold signal-to-noise ratio (SNRTH). [8] 8.
The method of claim 7, wherein the area (26) surrounding the candidate region (20) is defined by a space delimited by a mass-to-charge ratio range (m/zP0-m/zP1) that includes a mass-to-charge ratio range (m/zC0-m/zC1) corresponding to the candidate region (20), and by a retention time range (RTP0-RTP1) that includes the retention time range (RTC0-RTC1) corresponding to the candidate region (20). [9] 9. The method of any of the preceding claims, comprising: - defining a set of molecular formulas based on the sample to be analyzed; - defining ionization adducts associated with the molecular formulas; and - generating the molecular formula database (10) including, for each molecular formula and associated ionization adduct, the theoretical mass-to-charge ratio (m/z)T. [10] 10. The method of any of the preceding claims, comprising performing a mass spectrometry analysis coupled to a separation technique on the sample to obtain the mass spectrum (1). [11] 11. The method of any of the preceding claims, comprising performing a tandem mass spectrometry analysis using the information included in the inclusion list to identify compounds in the sample. [12] 12. A system for the identification of compounds in complex biological or environmental samples, characterized in that it comprises a control unit with data processing means configured to execute the steps of the method according to any of claims 1-11. [13] 13. The system of claim 12, comprising a mass spectrometer configured to perform a mass spectrometry analysis coupled to a separation technique on the sample to obtain the mass spectrum (1). [14] 14. The system of any one of claims 12 to 13, comprising a mass spectrometer configured to perform a tandem mass spectrometry analysis using the information included in the inclusion list to identify compounds in the sample. [15] 15.
A program product for the identification of compounds in complex biological or environmental samples, comprising program instructions for carrying out the method defined in any of claims 1-11 when the program is run on a processor. [16] 16. The program product according to claim 15, comprising at least one computer readable storage medium that stores the program instructions.
Patent family:
Publication number | Publication date
WO2021148371A1 | 2021-07-29
ES2767375B2 | 2020-12-17
Legal status:
2020-06-17 | BA2A | Patent application published | Ref document number: 2767375 | Country: ES | Kind code: A1 | Effective date: 2020-06-17
2020-12-17 | FG2A | Definitive protection | Ref document number: 2767375 | Country: ES | Kind code: B2 | Effective date: 2020-12-17
Priority:
Application number | Publication number | Priority date | Filing date | Title
ES202030061A | ES2767375B2 | 2020-01-24 | 2020-01-24 | PROGRAM METHOD, SYSTEM AND PRODUCT FOR THE IDENTIFICATION OF COMPOUNDS IN COMPLEX BIOLOGICAL OR ENVIRONMENTAL SAMPLES
PCT/EP2021/051000 | WO2021148371A1 | 2020-01-24 | 2021-01-19 | Method and system for the identification of compounds in complex biological or environmental samples